Abstract

For small cell technology to significantly increase the capacity of tower-based cellular networks, mobile users 
will need to be actively pushed onto the more lightly loaded tiers (corresponding to, e.g., pico and femtocells), 
even if they offer a lower instantaneous SINR than the macrocell base station (BS). Optimizing a function of the 
long-term rates for each user requires (in general) a massive utility maximization problem over all the SINRs and 
BS loads. On the other hand, an actual implementation will likely resort to a simple biasing approach where a 
BS in tier j is treated as having its SINR multiplied by a factor Aj > 1, which makes it appear more attractive 
than the heavily-loaded macrocell. This paper bridges the gap between these approaches through several physical 
relaxations of the network-wide association problem, whose solution is NP-hard. We provide a low-complexity
distributed algorithm that converges to a near-optimal solution with a theoretical performance guarantee, and we
observe that simple per-tier biasing loses surprisingly little if the bias values Aj are chosen carefully. Numerical
results show a large (3.5x) throughput gain for cell-edge users and a 2x rate gain for median users relative to a
maximum received power association.

I. Introduction 

To meet surging traffic demands, cellular networks are trending strongly towards increasing heterogeneity,
especially through proliferation of small BSs, e.g., picocells and femtocells, which differ primarily
in terms of maximum transmit power, physical size, ease-of-deployment and cost [1-6]. Heterogeneous 
networks (HetNets) enable a more flexible, targeted and economical deployment of new infrastructure 
versus tower-mounted macro-only systems, which are very expensive to deploy and maintain [7]. Even 
with a targeted deployment where these small BSs are placed in high-traffic zones, most users will still
receive the strongest downlink signal from the tower-mounted macrocell BS. In order to make the most 
of the new low-power infrastructure, mobile users should be actively "pushed" onto the small BSs, which 
will often be lightly loaded and so can provide a higher rate over time by offering the mobile many more 
resource blocks than the macrocell. Similarly, a more balanced user association reduces the load on the 
macrocell, allowing it to better serve its remaining users. This paper investigates optimal and near-optimal 
solutions of this cell association problem, particularly those with simple requirements for coordination 
and side information. 

A. Related Work 

Most prior work on load balancing schemes applies to macrocell-only networks. HetNets are
much more sensitive to the cell association policy because of the massive disparities in cell sizes. These 
unequal cell sizes result in very unequal loads in a max-SINR cell association, assuming a relatively 
uniform mobile user distribution. That is, if users simply associate with the strongest BS, the difference 
in load in macrocell networks is constrained since the cells all have roughly the same coverage area. But 
in HetNets, the opposite is true, making the problem considerably more complex, and the potential gains 
from load-aware associations larger. 

The existing work on cell association can be broadly classified into two groups: 

1) Strategies based on channel borrowing from lightly-loaded cells, such as hybrid channel assignment 
(HCA) [8], channel borrowing without locking (CBWL) [9], load balancing with selective borrowing 
(LBSB) [10,11], etc; 

2) Strategies based on traffic transfer to lightly-loaded cells, such as directed retry [12], mobile-assisted
call admission algorithms (MACA) [13], hierarchical macrocell overlay systems [14, 15],
cell breathing techniques [16, 17], and biasing methods in HetNets [7]. 

The approach in this paper is based on traffic transfer. There have been many efforts in the literature 
toward traffic transfer strategies in macro-only cellular networks. The so-called "cell breathing" technique 
[16, 17] dynamically changes (contracts or expands) the coverage area depending on the load situation 
(over-loaded or under-loaded) of the cells by adjusting the transmit power. Sang et al. [18] proposed an
integrated framework consisting of MAC-layer cell breathing and load-aware handover/cell-site selection.
Cell breathing aims to balance the load among neighboring macrocells, while in HetNets we additionally 
need to balance the load among different tiers. 

A popular approach in conventional networks, related to the direction we propose, is to achieve
load balancing by changing the objective function to be concave. Indeed, there is considerable work
investigating different utility functions, such as network-wide proportional fairness [19], network-wide
max-min fairness [20], maximization of network-wide aggregate utility by partial frequency reuse and
load balancing [21], and α-optimal user association [22]. We adopt the logarithmic function as the utility
function, which is similar to proportional fairness, and achieves a desirable tradeoff between opportunism 
and fair allocation across users, by saturating the reward for providing more rate to users which already 
have a high rate. 

In HetNets, there are a few recent investigations of the cell association problem. A joint optimization of 
channel selection, user association and power control in HetNets is considered in [23], aiming to minimize 
the potential delay, which is related to the sum of the inverse of the per-user SINRs, where the SINR 
takes into account the load when computing the interference. Corroy et al. [24] propose a dynamic cell 
association to maximize sum rate as well as a heuristic cell range expansion algorithm for load balancing. 
Cell range expansion is an effective method to balance the load among high and low power BSs, which 
is enabled through cell biasing [7,25]. It is achieved by performing user association based on the biased 
measured signal, which leads to better load balancing, but the improvement of load balancing may not 
overwhelm the degradation in SINR that certain users suffer. Therefore, how to design the biasing factor 
is an important open problem. 

B. Contributions and Organization 

We present a load-aware cell association method and distributed algorithm for downlink HetNets, that 
results in the following main contributions. 

First, in Section III, we undertake an optimization theoretic approach to the load-balancing problem, 
where we consider cell association and resource allocation jointly. We decouple the joint general utility 
maximization problem by assuming (optimistically) that users can be associated with more than one BS. 
This approach provides an upper bound on achievable network utility which can serve as a benchmark. 
However, in a real system, it is much more difficult to implement multi-BS association than single-BS
association. Therefore, we formulate a logarithmic utility maximization problem for single-BS association, 
and show that equal resource allocation is actually optimal, over a sufficiently large time window. This 
observation allows the coupled problem in single-BS association to reduce to the cell association problem 
with equal resource allocation, which along with the fractional association assumption converts the
previously intractable combinatorial problem into a convex optimization problem. 

In Section IV, we exploit the convexity of the problem to develop a distributed algorithm via dual 
decomposition that converges towards the optimal solution with a guarantee on the maximum gap from 
optimality. This provides a feasible, efficient and low-overhead algorithm for implementation in HetNets. 

In Section V, we leverage our provably optimal solutions to ask a basic question: how much of the 
performance gain can a simple policy based on a priori bias factors achieve? Our results show that this 
simple approach gets surprisingly close to the gains of the load-aware utility maximization. The gains 
from this approach are shown to be very large for most users in the system, with rate gains ranging from 
2-3.5x for the bottom half of users. To put this in context, this is a gain on par with what would otherwise 
be achieved by doubling or tripling the amount of spectrum for a given service provider. Cell interior
users experience little to no rate gain (or a small loss), but this has little relevance in practice since such 
users are already well-served. 

II. System Model 

In downlink (DL) cellular networks, the default association scheme is max-SINR, which indeed maximizes
the probability of coverage, i.e., $P(\mathrm{SINR} > \beta)$, where $\beta$ is a target SINR (or equivalently minimizes
the probability of outage, i.e., $P(\mathrm{SINR} < \beta)$). In conventional networks, the default association scheme
in uplink (UL) is typically the same as the association in DL, since the coverage areas are almost the 
same among different macrocells. However, this is not the case in HetNets, since the BSs of different 
tiers have such widely divergent transmit powers. Rather, the DL coverage area of macro BSs is much 
larger than that of smaller BSs. If we adopt the same association in UL, the cell-edge macro-users will 
cause great interference to nearby users, especially for users which are associated to nearby small cells. 
Furthermore, the key performance metric for users is their service rate, not SINR. The instantaneous rate is of
course directly related to SINR (e.g., $\log_2(1 + \mathrm{SINR})$), but the overall served rate is then multiplied by
the fraction of resources that the user gets. Hence, heavily-loaded cells provide a lower rate over time, even if
they provide a higher SINR. The load balancing problem is therefore important in both DL and UL HetNets.
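
As a small numerical illustration of this point, the sketch below compares a heavily loaded macrocell with a lightly loaded picocell. The SINR values and user counts are hypothetical, the function name `long_term_rate` is introduced only for this example, and resources are assumed to be shared equally among the users of a BS.

```python
import math

def long_term_rate(sinr_db, num_users):
    """Spectral efficiency log2(1 + SINR), shared equally among num_users."""
    sinr_linear = 10 ** (sinr_db / 10)
    return math.log2(1 + sinr_linear) / num_users

# Hypothetical numbers: a loaded macrocell vs. a lightly loaded picocell.
macro_rate = long_term_rate(sinr_db=10.0, num_users=50)
pico_rate = long_term_rate(sinr_db=0.0, num_users=5)
print(f"macro: {macro_rate:.3f} bit/s/Hz, pico: {pico_rate:.3f} bit/s/Hz")
```

With these assumed numbers, the picocell offers a 10 dB lower SINR yet a higher long-term rate, because its resources are split among far fewer users.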

In this paper, we focus on DL cell association. UL could likely be considered through a similar approach, 
but is complicated by the use of UL power control, which changes the interference depending on the 
association. Here, we assume that all BSs have full buffers and slowly changing (or constant) transmit 
power, which means that BSs are always sending at a fixed transmit power over the association time scale,
and thus the interference power is a constant which is independent of the specific association. 

A downlink HetNet consisting of K tiers of BSs is illustrated in Fig. 1 with K = 3. Each tier models
a particular type of BS: for example, tier 1 consists of traditional macrocells, and tier 2 and tier 3 could 
be interpreted as being comprised of picocells and femtocells, respectively. Picocells transmit at a lower 
power with a higher deployed density than the macrocells, while the femtocells, or the home BSs, may 
eventually be deployed very densely but have a very small transmit power. 




Fig. 1. Illustration of a three-tier heterogeneous cellular network. Only a single macro-cell is shown for simplicity. 

We denote by B the set of all BSs, and U the set of all users. During the connection period, we denote
by $c_{ij}$ the achievable rate, where generally $c_{ij}$ is a logarithmic function of the SINR:

$$c_{ij} = f(\mathrm{SINR}_{ij}) = f\left( \frac{P_j g_{ij}}{\sum_{k \in B, k \neq j} P_k g_{ik} + \sigma^2} \right), \tag{1}$$

where $P_j$ is the transmit power of BS j, $g_{ij}$ denotes the channel gain between user i and BS j, which
in general includes path loss, shadowing and antenna gain, and $\sigma^2$ denotes the noise power level. The
association is carried out on a large time scale compared to the channel variations. The SINR used for association
is averaged over the association time and is thus a constant regardless of the channel dynamics. As
for resource allocation, we assume that it is carried out well within the channel coherence
time, so that the channel can be regarded as static during each resource allocation period. This model is
applicable to low-mobility environments. We leave the stochastic channel analysis as future work.
Note that although this paper focuses on a single-carrier system, our model can be extended to a multi-carrier
system in a straightforward manner (i.e., let $c_{ij}$ be the average rate over different bands).
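
The rate model in (1) can be sketched as follows. This is an illustrative sketch only: it assumes $f(\cdot) = \log_2(1 + \cdot)$ as one common choice, and the transmit powers, channel gains and noise level are hypothetical values rather than parameters from the paper.

```python
import math

def sinr(i, j, power, gain, noise):
    """SINR of user i from BS j as in (1): all other BSs act as interferers."""
    signal = power[j] * gain[i][j]
    interference = sum(power[k] * gain[i][k]
                       for k in range(len(power)) if k != j)
    return signal / (interference + noise)

def rate_matrix(power, gain, noise):
    """c[i][j] = log2(1 + SINR_ij), an assumed choice for f in (1)."""
    return [[math.log2(1 + sinr(i, j, power, gain, noise))
             for j in range(len(power))]
            for i in range(len(gain))]

# Hypothetical example: one macro BS (40 W) and one pico BS (1 W), two users.
power = [40.0, 1.0]
gain = [[1e-9, 1e-9],    # user 0: equal channel gain to both BSs
        [1e-10, 1e-8]]   # user 1: much closer to the pico BS
c = rate_matrix(power, gain, noise=1e-13)
```

Because the interference is computed from fixed transmit powers, the matrix `c` does not depend on which users are associated with which BS, in line with the full-buffer assumption above.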

Since each BS generally serves more than one user, users of the same BS need to share resources such 
as time and frequency slots. The long-term service rate experienced by a user thus depends on the load of 
the BS and will therefore be only a fraction of the value $c_{ij}$ (unless BS j exclusively serves user i). We
assume that users will keep transmitting during an association time scale (i.e., users have full buffers), so 
the load on a BS is directly proportional to the total number of users associated with it. 

Moreover, the overall service rate also depends on the resource allocation method of the BSs. In 
principle, any allocation method or service discipline with which the resource allocation is related to both 
the load of BSs and the rate of each user can be used. Therefore, the achievable overall rate of user i
associated with BS j depends on $c_{ij}$, $c_{qj}$, and how BS j distributes its resources among its associated
users, for all $q \in U \setminus \{i\}$. We focus on finding an optimal resource allocation and optimal cell associations
which maximize the utility. During the connection between BS j and user i, denoting the fraction of
resources that BS j allocates to user i by $y_{ij}$, we can define the overall long-term rate as follows.

Definition 1. If user i is associated with BS j, the overall long term rate is 

$$R_{ij} = y_{ij} c_{ij}, \tag{2}$$

where $\sum_i y_{ij} = 1$, $\forall j$. We denote the total overall rate of user i as $R_i$, where $R_i = \sum_j R_{ij}$.

In the following, we investigate a utility maximization problem for the overall rate Ri to find the optimal 
association and resource allocation. 

III. Problem Formulation 

Taking a utility function perspective, we assume user i obtains utility $U_i(R_i)$ when receiving rate $R_i$,
where the function $U_i(\cdot)$ is a continuously differentiable, monotonically increasing, and strictly concave
utility function [26].

A. General Utility Maximization: Unique Association 

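We formulate an optimization problem which involves finding the indicators $\{x_{ij}\}$ corresponding to the
association (i.e., $x_{ij} = 1$ when user i is associated with BS j, otherwise $x_{ij} = 0$) and $\{y_{ij}\}$ corresponding
to the resource allocation that maximize the aggregate utility function:

$$
\begin{aligned}
\max_{x,\,y} \quad & \sum_{i \in U} U_i\Big( \sum_{j \in B} y_{ij} c_{ij} \Big) \\
\text{s.t.} \quad & \sum_{j \in B} x_{ij} = 1, \quad \forall i \in U, \\
& \sum_{i \in U} y_{ij} \le 1, \quad \forall j \in B, \\
& 0 \le y_{ij} \le x_{ij}, \quad x_{ij} \in \{0, 1\}, \quad \forall i \in U, \ \forall j \in B.
\end{aligned}
\tag{3}
$$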



B. General Utility Maximization: Allowing Joint Association 

The indicator variable $x_{ij}$ enforces unique association, which is combinatorial. Moreover, the cell
association has to be considered jointly with resource allocation, because resource allocation depends 
on the association and user association depends on the achievable resource for each user. Therefore, the 
resulting problem is difficult to solve. While allowing a user to be served by multiple BSs may require 
more overhead to implement, and hence perhaps may not be viable in practice, it provides an upper bound 
on the network performance. In this section, we make the following assumption: 

Assumption 1. We assume that users can be associated with more than one BS at the same time. 

Under this assumption, the constraint $\sum_j x_{ij} = 1$ can be eliminated, and hence there is no need for
$x_{ij}$ as additional indicators for cell association. The resource allocation variable $y_{ij} \in [0, 1]$ indicates the
association, i.e., user i is associated with BS j when $y_{ij} > 0$, otherwise they are not connected.

Therefore, we focus only on how the resources should be allocated to different users with
different rates so as to maximize the utility, instead of considering this jointly with cell association.

We formulate the joint association problem as follows:

$$
\begin{aligned}
\max_{y} \quad & \sum_{i \in U} U_i\Big( \sum_{j \in B} y_{ij} c_{ij} \Big) \\
\text{s.t.} \quad & \sum_{i \in U} y_{ij} \le 1, \quad \forall j \in B, \\
& 0 \le y_{ij} \le 1, \quad \forall i \in U, \ \forall j \in B.
\end{aligned}
\tag{4}
$$

Note that this joint association scheme focuses on how to allocate resources at each BS, rather than
how to associate users. In the following sections, we show that with some specific utility functions (e.g.,
logarithmic utility), $y_{ij}$ can be directly found without Assumption 1 and thus there is no need to decouple
$x_{ij}$ and $y_{ij}$ as in this optimization. However, problem (4) provides an ultimate limit on achievable network
performance for general utility maximizations. Interestingly, our simulation results show that the bound
is quite tight in logarithmic utility maximization.



C. Logarithmic Utility Formulation 

Using linear utility functions for throughput maximization results in a trivial solution, where each 
BS serves only its strongest user. While throughput-optimal, this is not a satisfactory solution for many 
reasons. Instead, we seek a utility function that naturally achieves load balancing, and some level of fairness
among the users. To accomplish this, we use a logarithmic utility function. The resulting objective function
with logarithmic utility is

$$\sum_{i \in U} \log\Big( \sum_{j \in B} y_{ij} c_{ij} \Big). \tag{5}$$

The logarithm is concave, and hence has diminishing returns. This property encourages load balancing.
This is consistent with the resource allocation philosophy in real systems, where allocating more resources
for a well-served user is considered low priority, whereas providing more resources to users with low
rates (e.g., in the linear region of the logarithmic function) is considered desirable. The logarithmic function
in particular is a very common choice of utility function. Therefore, in the remainder of this paper, we
use a logarithmic utility function.

D. Analysis of Optimized Resource Allocation

For general utility functions, we proposed one possible tractable model for the joint cell association and
resource allocation problem in Sec. III-B, which allows users to be served by multiple BSs. In practice, this
is much more difficult to implement than single-BS association. Therefore, we consider it as a benchmark
in this paper, providing an upper bound on network performance. In the remainder of this paper, we focus
on the log utility function in single-BS association.

In single-BS association, the objective function of (3) becomes

$$\sum_{j \in B} \ \sum_{i \in \{i \mid x_{ij} = 1\}} \log(y_{ij} c_{ij}). \tag{6}$$

Then, we conduct the resource allocation analysis on a typical BS j and the users associated with that
BS. The utility maximization problem for the users associated with BS j is



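$$
\begin{aligned}
\max_{y} \quad & \sum_{i \in \{i \mid x_{ij} = 1\}} \log(y_{ij} c_{ij}) \\
\text{s.t.} \quad & \sum_{i \in \{i \mid x_{ij} = 1\}} y_{ij} \le 1, \\
& 0 \le y_{ij} \le 1, \quad \forall i \in U.
\end{aligned}
\tag{7}
$$

Definition 2. We define the effective load $K_j$ of BS j as the number of users associated with it, i.e.,
$K_j = \sum_{k \in U} x_{kj}$, where $x_{kj}$ is the association indicator.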

The optimization (7) suggests the following proposition. 

Proposition 1. The optimal resource allocation is equal allocation, i.e., $y_{ij} = 1/K_j$.

Proof: See Appendix A. ■ 
In this paper, we assume the channel is static during each resource allocation period. Note that in a
stochastic setting which takes into account time-varying channels and user mobility, this proposition
becomes exactly proportional fair scheduling (i.e., to maximize the log utility in terms of long-term 
average throughput). 

Therefore, using the logarithmic utility function, the resource allocation is quite simple, and is independent
of the distribution of SINR, which makes the joint cell association and resource allocation problem
tractable. In particular, the optimal allocation is uniform across users served by that BS.
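
One way to see why equal allocation is optimal is through the Lagrangian of (7). The following is only a sketch and not necessarily the argument given in Appendix A; the multiplier $\lambda$ is introduced here for illustration. Since $\log(y_{ij} c_{ij}) = \log(y_{ij}) + \log(c_{ij})$, the rates $c_{ij}$ do not affect the maximizer, and since the objective is increasing in every $y_{ij}$, the resource constraint is tight at the optimum.

```latex
\begin{align*}
L(y,\lambda) &= \sum_{i:\,x_{ij}=1} \log(y_{ij} c_{ij})
               - \lambda\Big(\sum_{i:\,x_{ij}=1} y_{ij} - 1\Big), \\
\frac{\partial L}{\partial y_{ij}} &= \frac{1}{y_{ij}} - \lambda = 0
  \;\Longrightarrow\; y_{ij} = \frac{1}{\lambda}
  \quad \text{for every user } i \text{ associated with BS } j, \\
\sum_{i:\,x_{ij}=1} y_{ij} &= \frac{K_j}{\lambda} = 1
  \;\Longrightarrow\; \lambda = K_j, \qquad y_{ij} = \frac{1}{K_j}.
\end{align*}
```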

Given this equal resource allocation, the long-term rate for user i from BS j is 

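$$R_{ij} = \frac{c_{ij}}{\sum_{k \in U} x_{kj}}, \tag{8}$$

so we can rewrite the optimization (3) as

$$
\begin{aligned}
\max_{x} \quad & \sum_{i \in U} \sum_{j \in B} x_{ij} \log\left( \frac{c_{ij}}{\sum_{k \in U} x_{kj}} \right) \\
\text{s.t.} \quad & \sum_{j \in B} x_{ij} = 1, \quad \forall i \in U, \\
& x_{ij} \in \{0, 1\}, \quad \forall i \in U, \ \forall j \in B.
\end{aligned}
\tag{9}
$$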

When the network is small, the optimal user association can be found through a brute force search. As 
an illustrative example, Fig. 2 compares the resulting association patterns of max-SINR vs. the proposed 
load-aware association scheme in (5). In Fig. 2(a), max-SINR associates many users with macro BS 1, 
which overloads it; on the other hand, many of the small BSs serve very few users with some even
being idle. Fig. 2(b) shows the load-aware association, which moves traffic off congested macrocells and 
onto more lightly loaded small cells. Note that admission control carries out a similar task, where newly
arriving users are blocked or forced onto other lightly loaded BSs when the candidate BS is heavily loaded.
However, admission control is performed before a connection is established (i.e., only for new users rather
than existing users), and thus cannot achieve an optimal association in terms of load balancing.
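
For a very small network, the brute force search over (9) can be sketched as below. The rate values are hypothetical, the function names are introduced only for this example, and the enumeration visits all $(N_B)^{N_U}$ associations, so it is practical only for a handful of users.

```python
import itertools
import math

def sum_log_utility(assoc, c, num_bs):
    """Objective of (9): sum_i log(c[i][j_i] / K_{j_i}) for a single-BS association."""
    load = [0] * num_bs
    for j in assoc:
        load[j] += 1
    return sum(math.log(c[i][j] / load[j]) for i, j in enumerate(assoc))

def brute_force_association(c):
    """Enumerate all N_B ** N_U associations; feasible only for tiny networks."""
    num_users, num_bs = len(c), len(c[0])
    return max(itertools.product(range(num_bs), repeat=num_users),
               key=lambda assoc: sum_log_utility(assoc, c, num_bs))

# Hypothetical rates c[i][j] (bit/s/Hz): BS 0 is a macrocell, BS 1 a small cell.
c = [[4.0, 1.0],
     [3.5, 1.5],
     [3.0, 2.0],
     [2.5, 2.4]]
# Max-rate association (equivalent to max-SINR when the rate grows with SINR).
max_rate = tuple(max(range(2), key=lambda j: row[j]) for row in c)
best = brute_force_association(c)
```

In this assumed example every user sees its highest rate from the macrocell, yet the sum-log objective is larger when the two users with the weakest macro rates are moved to the small cell.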




(a) max-SINR association (b) Fractional-rounding methods with max-sum-log objective

Fig. 2. Different associations in HetNets. The load-aware scheme leads to more efficient resource utilization by handing over load to
underutilized small BSs.

E. Relaxation to Fractional User Association 

The above problem is combinatorial due to the binary variables $x_{ij}$. The complexity of the brute force
algorithm is $\Theta((N_B)^{N_U})$, where $N_B$ and $N_U$ denote the number of BSs and number of users, respectively.
The computation is essentially impossible for even a modest-sized cellular network. To overcome this,
we again invoke Assumption 1 to allow users to be associated with more than one BS, i.e., "fractional user
association" (FUA). This physical relaxation removes the combinatorial nature of the problem,
and upper bounds the special case where each user is associated with just one BS. Since it is more difficult to
implement multiple-BS association than single-BS association in a practical system, we adopt a
rounding method to revert to the single-BS association of (9).

Note that the upper bound provided by FUA is different from that provided by joint association with 
a general utility function: FUA upper bounds the performance of the cell association problem with equal 
resource allocation, while the joint association problem in Section III-B provides an upper bound without
any restriction on resource allocation. Nevertheless, the numerical results in Section VI show that there
is almost no loss after rounding, and the upper bound provided by FUA is quite tight.

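With FUA, the indicators $x_{ij}$ can take on any real value in [0, 1]. The following relaxation of (3) is
convex:

$$
\begin{aligned}
\max_{x} \quad & \sum_{i \in U} \sum_{j \in B} x_{ij} \log\left( \frac{c_{ij}}{\sum_{k \in U} x_{kj}} \right) \\
\text{s.t.} \quad & \sum_{j \in B} x_{ij} = 1, \quad \forall i \in U, \\
& 0 \le x_{ij} \le 1, \quad \forall i \in U, \ \forall j \in B.
\end{aligned}
\tag{10}
$$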

To directly solve the convex optimization (10), global network information is necessary, which requires 
a centralized controller for user association and coordination. In the following section, we propose a 
distributed algorithm without coordination. 
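
To make the relaxed objective and the return to single-BS association concrete, the sketch below evaluates the objective of (10) for a fractional association and then rounds it. The rounding rule shown (each user keeps the BS carrying its largest fraction) is an assumption made for illustration, since the specific rounding method is not spelled out here; the rates and fractions are hypothetical.

```python
import math

def fua_objective(x, c):
    """Objective of (10): sum_ij x_ij * log(c_ij / K_j), with K_j = sum_i x_ij."""
    num_bs = len(c[0])
    load = [sum(row[j] for row in x) for j in range(num_bs)]
    return sum(x_ij * math.log(c[i][j] / load[j])
               for i, row in enumerate(x)
               for j, x_ij in enumerate(row) if x_ij > 0)

def round_to_single_bs(x):
    """Assumed rounding rule: each user keeps the BS with its largest fraction."""
    return [max(range(len(row)), key=lambda j: row[j]) for row in x]

c = [[4.0, 1.0], [3.0, 2.0]]          # hypothetical rates c[i][j]
x_frac = [[1.0, 0.0], [0.3, 0.7]]     # a feasible fractional association
assoc = round_to_single_bs(x_frac)    # -> [0, 1]
x_int = [[1.0 if j == a else 0.0 for j in range(2)] for a in assoc]
```

Evaluating `fua_objective` on both `x_frac` and `x_int` gives a simple way to check how much utility a given rounding step changes on a particular instance.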

IV. Primal-Dual Distributed Algorithm 

The centralized functionality for solving the convex optimization problem is usually implemented by a
server in the core network for macrocells (e.g., the Radio Network Controller (RNC), which carries out resource
management in UMTS). It allows only slow adaptation at relatively long timescales and requires
coordination among different tiers. Additional issues with centralized mechanisms include excessive
computational complexity and low reliability, as any crash of the centralized controller will
disrupt load balancing. In HetNets, it is usually difficult to coordinate macrocells and femtocells, which
are deployed by operators and users respectively. Therefore, a low-complexity distributed algorithm without
coordination is desirable.

In this section, we propose a distributed algorithm via Lagrangian dual decomposition [27]. The dual 
problem of (10) is decoupled into two sub-problems, which can be solved separately on users' side and 
BSs' side respectively. 

A. Dual Decomposition 

The primal formulation in (10) can be expressed in an equivalent form by introducing a new set of
variables, the load metric $K_j = \sum_i x_{ij}$:

$$
\begin{aligned}
\max_{x,\,K} \quad & \sum_{i} \sum_{j} x_{ij} \log(c_{ij}) - \sum_{j} K_j \log(K_j) \\
\text{s.t.} \quad & \sum_{j} x_{ij} = 1, \quad \forall i \in U, \\
& \sum_{i} x_{ij} = K_j, \quad \forall j \in B, \\
& K_j \le N_U, \quad \forall j \in B, \\
& x_{ij} \ge 0, \quad \forall i \in U, \ \forall j \in B,
\end{aligned}
\tag{11}
$$

where the redundant constraint $K_j \le N_U$ is added for the convergence analysis of the following
distributed algorithm, which is further explained in the proof of Theorem 1.

The only coupling constraint in problem (11) is $\sum_i x_{ij} = K_j$. This motivates us to turn to the Lagrangian
dual decomposition method, whereby a Lagrange multiplier $\mu$ is introduced to relax the coupling constraint.
The dual problem is thus:

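$$D: \quad \min_{\mu} \ D(\mu) = f_x(\mu) + g_K(\mu), \tag{12}$$

where

$$
\begin{aligned}
f_x(\mu) = \max_{x} \quad & \sum_{i} \sum_{j} x_{ij} \big( \log(c_{ij}) - \mu_j \big) \\
\text{s.t.} \quad & \sum_{j} x_{ij} = 1, \\
& 0 \le x_{ij} \le 1,
\end{aligned}
\tag{13}
$$

$$g_K(\mu) = \max_{K \le N_U} \ \sum_{j} K_j \big( \mu_j - \log(K_j) \big). \tag{14}$$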



When the optimal values of (11) and (12) are the same, we say that strong duality holds. Slater's condition
is one of the simple constraint qualifications under which strong duality holds. The constraints in (11) are
all linear equalities and inequalities, and thus the Slater condition reduces to feasibility [28]. Therefore,
the primal problem (11) can be equivalently solved by the dual problem (12). Denote by $x_{ij}(\mu)$ the
maximizer of the first sub-problem (13) and by $K_j(\mu)$ the maximizer of the second sub-problem (14).
There exists a dual optimal $\mu^*$ such that $x(\mu^*)$ and $K(\mu^*)$ are primal optimal. Therefore, given the
dual optimal $\mu^*$, we can get the primal optimal solution by solving the decoupled inner maximization
problems (13) and (14) separately without coordination among the users and BSs.
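
The decoupling can be made explicit by writing out the Lagrangian of (11) with respect to the coupling constraint. The sketch below assumes the sign convention in which $\mu_j$ multiplies $K_j - \sum_i x_{ij}$; the remaining constraints of (11) are kept as constraints of the two inner problems.

```latex
\begin{align*}
L(x, K, \mu)
  &= \sum_{i}\sum_{j} x_{ij}\log(c_{ij}) - \sum_{j} K_j \log(K_j)
     + \sum_{j} \mu_j \Big(K_j - \sum_{i} x_{ij}\Big) \\
  &= \underbrace{\sum_{i}\sum_{j} x_{ij}\big(\log(c_{ij}) - \mu_j\big)}_{\text{depends only on } x}
   + \underbrace{\sum_{j} K_j\big(\mu_j - \log(K_j)\big)}_{\text{depends only on } K}.
\end{align*}
```

Maximizing the first term over $x$ gives the users' sub-problem (13) and maximizing the second over $K$ gives the BSs' sub-problem (14), so for a fixed $\mu$ the two maximizations do not interact.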

B. The Distributed Algorithm

The outer problem is solved by the gradient projection method [29], where the Lagrange multiplier $\mu$
is updated in the opposite direction to the gradient $\nabla D(\mu)$. Evaluating the gradient of the dual objective
function requires us to solve the inner maximization problem, which has been decomposed into two sub-problems
$f$ and $g$. These two sub-problems can be solved in a distributed manner. The t-th iteration of the
gradient projection algorithm is given as follows:

1) User's Algorithm:

i) Each user measures the SINR by using pilot signals from all BSs, and receives the value of $\mu_j$
broadcast by each BS at the beginning of the iteration.

ii) User i determines the BS $j^*$ which satisfies

$$j^* = \arg\max_{j} \big( \log(c_{ij}) - \mu_j(t) \big). \tag{15}$$

If there are multiple maximizers, the user chooses any one of them.

2) BSs' Algorithm:

Each BS updates the values of $K_j$ and $\mu_j$ in two steps and announces the new multiplier $\mu_j$ to the
system.

i) To obtain the maximizer of problem (14), we set its gradient to zero, subject to the constraint $K_j \le N_U$,
i.e.,

$$K_j(t+1) = \min\{ N_U, \ e^{\mu_j(t) - 1} \}. \tag{16}$$

ii) The new value of the Lagrange multiplier is updated by

$$\mu_j(t+1) = \mu_j(t) - \delta(t) \cdot \Big( K_j(t) - \sum_{i} x_{ij}(t) \Big), \tag{17}$$

where $\delta(t) > 0$ is a dynamically chosen stepsize sequence based on some suitable estimates.
There is a nice interpretation of $\mu$. The multiplier $\mu$ works as a message between users and BSs in
the system. In fact, it can be interpreted as the price of the BSs determined by the load situation, which
can be either positive or negative. If we interpret $\sum_i x_{ij}$ as the serving demand for BS j and $K_j$ as the
service that BS j can provide, then $\mu_j$ is the bridge between demand and supply, and Eq. (17) is indeed
consistent with the law of supply and demand: if the demand $\sum_i x_{ij}$ for BS j exceeds the supply $K_j$, the
price $\mu_j$ will go up; otherwise, the price $\mu_j$ will decrease. Thus, when BS j is over-loaded, it will
increase its price $\mu_j$ and fewer users will associate with it, while other under-loaded BSs will decrease
their prices so as to attract more users. Moreover, the role of $\mu$ in (15) in the distributed algorithm motivates
a rate bias scheme, which is discussed in Section V.

Given $x_{ij}(\mu)$ and $K_j(\mu)$, the adjustment (17) can be made completely distributed among BSs based
on only local information. At each iteration, the complexity of the distributed algorithm is $O(N_B N_U)$.
As for the exchanged information, at each iteration each BS broadcasts its $\mu_j$, which is a relatively small
real number, and each user reports its association request only to the one BS it wants to connect to.
The amount of information to be exchanged in the distributed algorithm is $k(N_B + N_U)$, where k is
the number of iterations, while in the centralized method it is proportional to $N_B \times N_U$. The gradient
method generally converges fast, especially with the dynamic stepsize proposed in Sec. IV-C, and thus k is
a small number (less than 20 in simulation). Therefore, even with the requirement of multiple exchanges 
of information, distributed algorithms may be superior for some cases, such as large scale problems. It is 
applicable as long as the convergence of distributed algorithm is faster than the association period. After 
iteratively performing the above steps, the algorithm is guaranteed to converge to a near-optimal solution. 
This is proved in the next subsection. 
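
The user and BS steps of this subsection can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: it uses a constant step size rather than the dynamic stepsize of Sec. IV-C, it evaluates the load target and the demand at the same price vector (the gradient in (22)), it breaks ties by taking the lowest BS index, and the function names `user_step`, `bs_step` and `dual_association` are hypothetical.

```python
import math

def user_step(c, mu):
    """Eq. (15): each user picks the BS maximizing log(c_ij) - mu_j."""
    return [max(range(len(mu)), key=lambda j: math.log(row[j]) - mu[j])
            for row in c]

def bs_step(assoc, mu, num_users, step):
    """Eqs. (16)-(17): compute the load target K_j and update the price mu_j."""
    new_mu = []
    for j, mu_j in enumerate(mu):
        # K_j = min(N_U, exp(mu_j - 1)), written so that exp cannot overflow.
        if mu_j - 1.0 < math.log(num_users):
            k_j = math.exp(mu_j - 1.0)
        else:
            k_j = float(num_users)
        demand = sum(1 for a in assoc if a == j)   # sum_i x_ij
        # The price rises when demand exceeds the load target, else it falls.
        new_mu.append(mu_j - step * (k_j - demand))
    return new_mu

def dual_association(c, iterations=200, step=0.05):
    """Price iteration with a constant step size (a simplifying assumption)."""
    num_users, num_bs = len(c), len(c[0])
    mu = [0.0] * num_bs
    for _ in range(iterations):
        assoc = user_step(c, mu)
        mu = bs_step(assoc, mu, num_users, step)
    return user_step(c, mu), mu
```

Each BS only needs its own price and the number of association requests it received, and each user only needs its measured rates and the broadcast prices, which reflects the local-information property described above. With a constant step the prices may keep oscillating around their limit, so this sketch does not carry the guarantee of Theorem 1.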

C. Step Size and Convergence 

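Suppose the stepsize is updated dynamically according to the rule

$$\delta(t) = \gamma(t) \, \frac{D(\mu(t)) - D(t)}{\|\partial D(\mu(t))\|^2}, \qquad 0 < \underline{\gamma} \le \gamma(t) \le \overline{\gamma} < 2, \tag{18}$$

where $D(t)$ is an estimate of the optimal value $D^*$ of problem (12), and $\underline{\gamma}$ and $\overline{\gamma}$ are some scalars [30]. We
consider a procedure for updating $D(t)$, whereby $D(t)$ is given by

$$D(t) = \min_{0 \le \tau \le t} D(\mu(\tau)) - \epsilon(t), \tag{19}$$

and $\epsilon(t)$ is updated according to

$$
\epsilon(t+1) =
\begin{cases}
\rho \, \epsilon(t), & \text{if } D(\mu(t+1)) \le D(\mu(t)), \\
\max\{ \beta \, \epsilon(t), \ \epsilon \}, & \text{if } D(\mu(t+1)) > D(\mu(t)),
\end{cases}
\tag{20}
$$

where $\epsilon$, $\beta$ and $\rho$ are fixed positive constants with $\beta < 1$ and $\rho \ge 1$ [30].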

Thus in this procedure, we want to reach a target level $D(t)$ that is smaller by $\epsilon(t)$ than the best
value achieved so far. Whenever the target level is achieved, we increase $\epsilon(t)$ (i.e., $\rho > 1$) or keep it at the
same value (i.e., $\rho = 1$). If the target level is not attained at a given iteration, $\epsilon(t)$ is reduced down to a
threshold $\epsilon$, which guarantees that the stepsize $\delta(t)$ in (18) is bounded away from zero. As a result, we have
the following theorem.
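
A compact sketch of this adjustment procedure is given below. It assumes the standard target-level form of the dynamic stepsize from [30], namely a step proportional to the gap between the current dual value and the target level divided by the squared subgradient norm; the function names and the default values of `gamma`, `rho`, `beta` and `eps_min` are hypothetical choices for illustration.

```python
def dynamic_step(d_current, d_best, eps, grad_norm_sq, gamma=1.0):
    """Step size: gamma * (D(mu(t)) - target) / ||subgradient||^2,
    with target level = d_best - eps as in (19)."""
    if grad_norm_sq == 0.0:
        return 0.0  # zero subgradient: the current multiplier is already optimal
    target = d_best - eps
    return gamma * (d_current - target) / grad_norm_sq

def update_eps(eps, d_next, d_current, rho=1.5, beta=0.5, eps_min=1e-3):
    """Rule (20): grow eps after a non-increasing step, otherwise shrink it,
    but never below the threshold eps_min."""
    if d_next <= d_current:
        return rho * eps
    return max(beta * eps, eps_min)
```

Because `update_eps` never returns a value below `eps_min`, the gap `d_current - target` stays at least `eps_min`, which is how the step size is kept away from zero.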

Theorem 1. Assume that the stepsize $\delta(t)$ is updated by the dynamic stepsize rule (18) with the adjustment
procedure (19) and (20). If $D^* > -\infty$, where $D^*$ denotes the optimal value, then

$$\inf_{t} D(\mu(t)) \le D^* + \epsilon. \tag{21}$$

Proof: The derivative of the function $D(\mu)$ in (12) is given by

$$\frac{\partial D}{\partial \mu_j}(\mu) = K_j(\mu) - \sum_{i} x_{ij}(\mu). \tag{22}$$

In our primal problem, $K_j = \sum_i x_{ij} \le N_U$, where $N_U$ is the total number of users. According to (22),
when $K_j$ and $\sum_i x_{ij}$ are bounded, the subgradient $\partial D$ of the dual objective function is also bounded:

$$\sup_{t} \{ \|\partial D(\mu(t))\| \} \le c, \tag{23}$$

where c is some scalar. Thus, our problem satisfies the necessary conditions of Proposition 6.3.6 in [30].
By applying this proposition, the theorem is proved. ■

V. Range Expansion (Biasing) 

The approaches proposed above are sensitive to the deployment of users and BSs, i.e., the algorithms
have to be run again and again in order to keep track of changes in the network. In this section, we
investigate a simple approach, called range expansion, which is insensitive to changes in the deployment.
Range expansion has been proposed as a practical way to balance loads in HetNets, since it allows for a simple
uncoordinated decision based only on the received power from a given BS [7,31]. It is implemented by
assigning a multiplicative SINR bias to each tier of BSs (depending primarily on their transmit power). 
For example, if a picocell had a 10 dB SINR bias vs. the macrocell BS, a user would associate with it until 
the SINR delivered by the macro BS was a full 10 dB higher than the picocell. This can be performed 
by measuring the pilot signals from the BSs within radio range and then simply associating with the one 
that has the highest biased received power. In this section we investigate whether this simple approach is 
compatible with the optimal load-aware problems formulated in the prior two sections. 

There are some recent studies on the SINR bias [25,32], but they have not given any theoretical guidance 
on the "best" biasing factors in the sense of load balancing and/or achieving some optimization criterion. In 
this section, we evaluate the range expansion that our optimal user association scheme provides in terms 
of SINR bias. Moreover, the distributed algorithm inspires a rate bias scheme where the biasing factor 
multiplies the rate instead of the SINR. The best SINR biasing factor is obtained by a brute force search 
based on the optimal FUA, and the best rate biasing factor is derived directly from the optimal $\mu^*$ in the 
dual distributed algorithm. The network-wide performance with either biasing factor comes close to 
that of the optimal FUA, with rate bias performing better than SINR bias. A more interesting observation 
is that the biasing factors are insensitive to the locations of BSs and users, which makes the bias schemes 
simple and robust to implement in practice. 

A. SINR Bias 

We first consider the SINR bias, where users are associated with the BS which provides the highest 
biased SINR. 

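Definition 3. Given the biasing factor $A_j$ for BS $j$, we define the biased SINR received by user $i$ from 
BS $j$ as 

$$\mathrm{SINR}'_{ij} = A_j \, \mathrm{SINR}_{ij} = \frac{A_j P_j g_{ij}}{\sum_{k \in \mathcal{B},\, k \neq j} P_k g_{ik} + \sigma^2}. \qquad (24)$$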

We adopt an identical biasing factor for all BSs in the same tier [7,25,32]. Note that setting the 
biasing factors of all tiers to 1 reduces to the conventional max-SINR cell association, and setting them 
to $A_j = 1/P_j$ associates users with the BS that has the lowest path loss. Biasing toward under-loaded small BSs 
extends their coverage and attracts more users, resulting in a fairer distribution of traffic. In 
simulation, the biasing factor is quite stable as the BS density and transmit power change, and the 
performance of SINR bias is very close to the optimal performance of FUA, as further discussed 
in Section VI. 
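
The max-biased-SINR rule can be sketched in a few lines of Python. This is an illustrative sketch, not the simulation code behind the results: the function names, the received-power values and the two-BS layout are hypothetical, and every BS other than the serving one is assumed to interfere.

```python
import numpy as np


def sinr_matrix(rx_power_mw, noise_mw):
    """SINR per (user, BS): signal over the sum of all other BSs plus noise."""
    total = rx_power_mw.sum(axis=1, keepdims=True)
    return rx_power_mw / (total - rx_power_mw + noise_mw)


def associate_sinr_bias(rx_power_mw, noise_mw, bias):
    """Return, for each user, the index of the BS with the largest biased SINR."""
    biased = sinr_matrix(rx_power_mw, noise_mw) * np.asarray(bias)
    return np.argmax(biased, axis=1)


# Hypothetical received powers in mW: rows are users, columns are (macro, pico).
rx = np.array([[2e-9, 1e-9],
               [1e-8, 1e-10]])
noise = 4e-11  # roughly -104 dBm

unbiased = associate_sinr_bias(rx, noise, [1.0, 1.0])   # max-SINR association
biased = associate_sinr_bias(rx, noise, [1.0, 10.0])    # 10 dB bias on the pico
```

In this toy case, both users pick the macro BS without bias, while the 10 dB pico bias moves the first user, whose macro SINR is less than 10 dB above its pico SINR, to the pico.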

B. Rate Bias 

According to our load-aware association schemes, the best SINR biasing factors are obtained by a brute 
force search with high complexity. The solution (15) of the dual distributed algorithm motivates the more 
tractable idea of rate bias. According to (15), user $i$ is associated with BS $j^*$, where 
$j^* = \arg\max_j \left( c_{ij} e^{-\mu_j^*} \right)$ 
for optimal association. Therefore, by setting the rate biasing factor to $B_j = e^{-\mu_j^*}$, the association would 
be exactly the same as that obtained by the distributed algorithm, which is a near-optimal solution. 

Definition 4. We define the biased rate $c'_{ij}$ of user $i$ from BS $j$ as 

$$c'_{ij} = c_{ij} \cdot B_j. \qquad (25)$$

Through range expansion, users are associated with the BS that provides the maximum biased rate $c'_{ij}$. In 
rate bias, the biasing factor appears as an exponent on the SINR term (i.e., $(1 + \mathrm{SINR}_{ij})^{B_j}$), which is different 
from SINR bias, where the biasing factor multiplies the SINR directly (i.e., $A_j \mathrm{SINR}_{ij}$). 

In the distributed algorithm, the price variable differs from BS to BS, even for those belonging to 
the same tier. However, in the investigation of range expansion, just as for SINR bias, we use the same 
biasing factor for all BSs in a given tier, namely the mean of the optimal multiplier, i.e., $B_j = \mathbb{E}[e^{-\mu_l^*}]$, 
where BS $l$ belongs to the $j$th tier. The rate bias results shown in the next section are very close to the optimal FUA solution. 
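
A short Python sketch of rate bias follows, under stated assumptions: the function names, the random SINR values and the example prices are hypothetical, and the per-tier factor is taken as the plain sample mean of $e^{-\mu}$ over the BSs of a tier, normalized so that the first (macro) tier has factor 1. The sketch also checks numerically that maximizing $\log(c_{ij}) - \mu_j$ and maximizing $c_{ij} e^{-\mu_j}$ pick the same BS, since the logarithm is monotone.

```python
import numpy as np


def associate_rate_bias(sinr, rate_bias):
    """Return, for each user, the BS index maximizing the biased rate c_ij * B_j."""
    rate = np.log2(1.0 + sinr)  # Shannon rate c_ij
    return np.argmax(rate * np.asarray(rate_bias), axis=1)


def tier_rate_bias(mu_opt, tier_of_bs, num_tiers):
    """Average exp(-mu) over the BSs of each tier, normalized to tier 0."""
    per_bs = np.exp(-np.asarray(mu_opt, dtype=float))
    tier_of_bs = np.asarray(tier_of_bs)
    bias = np.array([per_bs[tier_of_bs == k].mean() for k in range(num_tiers)])
    return bias / bias[0]


# Hypothetical SINRs (50 users, 4 BSs) and per-BS prices mu.
rng = np.random.default_rng(0)
sinr = rng.uniform(0.1, 20.0, size=(50, 4))
mu = rng.uniform(0.0, 2.0, size=4)

by_price = np.argmax(np.log(np.log2(1.0 + sinr)) - mu, axis=1)
by_bias = associate_rate_bias(sinr, np.exp(-mu))
same_choice = np.array_equal(by_price, by_bias)
```

Scaling all factors by a common constant, as the normalization in `tier_rate_bias` does, leaves the association unchanged because the argmax is scale invariant.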

VI. Performance Evaluation 

We consider a three-tier HetNet with transmit powers $\{P_1, P_2, P_3\} = \{46, 35, 20\}$ dBm. The theoretical 
analysis throughout this paper is independent of the spatial distribution of the BSs. For the simulations, we 
model the locations of the macro BSs as fixed, and the locations of the small BSs as uniformly and 
independently distributed in space. This corresponds to operator-deployed macrocells/picocells and customer-placed 
femtocells. We model the location processes across different tiers as independent, with deployed 
densities $\{\lambda_2, \lambda_3\} = \{5, 20\}$ per macrocell. In modelling the propagation environment, we use the path loss models 
$L(d) = 34 + 40\log(d)$ and $L(d) = 37 + 30\log(d)$ for macrocells/picocells and femtocells, respectively. We 
assume lognormal shadowing with a standard deviation $\sigma_s = 8$ dB. At room temperature and a bandwidth of 
10 MHz, the thermal noise power is $\sigma^2 = kTB = -104$ dBm. We then assume that during the connection 
period between user $i$ and BS $j$, the user achieves the Shannon capacity rate, i.e., $c_{ij} = \log_2(1 + \mathrm{SINR}_{ij})$. 
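
The noise figure and the path loss models can be checked with a few lines of Python. This is a sketch under assumptions not stated in this section: room temperature is taken as 290 K, the logarithm in the path loss models is taken as base 10, the distance unit is left to the caller, and shadowing is omitted. The function names are illustrative.

```python
import math

BOLTZMANN = 1.380649e-23  # J/K


def thermal_noise_dbm(bandwidth_hz, temperature_k=290.0):
    """Thermal noise power kTB expressed in dBm (290 K is an assumed value)."""
    watts = BOLTZMANN * temperature_k * bandwidth_hz
    return 10.0 * math.log10(watts * 1e3)


def path_loss_db(distance, femto=False):
    """Path loss in dB, assuming a base-10 logarithm; shadowing is omitted."""
    if femto:
        return 37.0 + 30.0 * math.log10(distance)
    return 34.0 + 40.0 * math.log10(distance)


def rx_power_dbm(tx_dbm, distance, femto=False):
    """Mean received power in dBm for a given transmit power and distance."""
    return tx_dbm - path_loss_db(distance, femto)


noise_dbm = thermal_noise_dbm(10e6)       # 10 MHz bandwidth
macro_rx = rx_power_dbm(46.0, 100.0)      # macro BS at distance 100
femto_rx = rx_power_dbm(20.0, 100.0, femto=True)
```

With these assumptions, `noise_dbm` evaluates to about -104 dBm, matching the value used in the simulations.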

A. Loads among different BSs 

Fig. 3 compares the load situations among different association schemes. The max-SINR association 
results in very unbalanced loads: the macro BSs are over-loaded, while small BSs serve far fewer users, 
with some even being idle. In the fractional association scheme, the load is shifted to the less congested 
small BSs, which suggests that our objective alleviates the asymmetric load problem. The results after 
rounding are almost the same as the global optimum obtained by fractional association, showing the 
effectiveness of the rounding scheme. This occurs because few users are associated with more 
than one BS: most users are not "fractional". Moreover, the fractional users usually have a strong 
preference towards one of the BSs. The dual distributed algorithm and biasing also provide near-optimal 
load distributions with low complexity. 

Fig. 3. Comparison of the average number of users per tier in a three-tier HetNet. Compared to max-SINR, the load is shifted from the over-loaded 
macro BSs to the less congested small BSs in fractional association. The rounding method and the biasing schemes yield results very close to 
the optimal results of fractional association. 

B. Rate CDF 

As another performance measure, Figs. 4 and 5 show the cumulative distribution function (CDF) of the 
long-term rate in HetNets and conventional networks, respectively, under different association schemes. In 
HetNets, the CDFs for joint association, fractional-rounding, the dual distributed algorithm and biasing 
all improve significantly at low rates vs. max-SINR, showing a 2-3.5x gain in both the static setting and the 
stochastic setting. The CDF of max-SINR catches up at a rate of 0.3 bits/s/Hz, since load balancing 
provides a more uniform user experience by taking resources from strong users. However, load balancing 
enables the system to accommodate more users, which will generally boost the system revenue. The 
CDFs of fractional-rounding and the dual distributed algorithm almost overlap, which verifies that the 
distributed algorithm converges to a near-optimal solution. The result of joint association is very close 
to that of fractional-rounding, which verifies Prop. 1. Note that in the stochastic setting, we adopt PF 
as the scheduling scheme. The static channel equals the average of the stochastic channel. From Fig. 4, we 
can see that the rate in the stochastic setting with PF is larger than the rate in the static setting, although under PF 
the resource allocation eventually converges to an almost equal allocation for each user. This is because 
PF changes the effective channel distribution (users are more likely to be served when their channel 
is good, i.e., the resulting $c_{ij}$ is larger than the average rate defined in this paper). Fig. 5 shows that the rate 
gain is unique to HetNets as long as the users are uniformly distributed. 

Fig. 4. The CDFs of overall rate in a three-tier HetNet, in both the static setting and the stochastic setting. The biasing factors of macro BSs, 
picos and femtos are $\{A_1, A_2, A_3\} = \{1.00, 4.00, 11.9\}$ in SINR bias, and $\{B_1, B_2, B_3\} = \{1.00, 1.59, 1.88\}$ in rate bias, respectively. 




Fig. 5. The CDFs of overall rate in macro-only networks (horizontal axis: rate in bit/s/Hz). 



The ratio of the rate $a$ attained at probability $P(R < a)$ by the various approaches to that of max-SINR is shown 
in Fig. 6. The rate gain is quite large (e.g., 3.5x vs. max-SINR at the 10% rate point). The results for 
simple biasing are very close to those of the optimum associations, where the empirically observed biasing factors 
of macrocells, picocells and femtocells turned out to be $\{A_1, A_2, A_3\} = \{0, 6, 10.8\}$ dB in SINR, and 
$\{B_1, B_2, B_3\} = \{1.00, 1.59, 1.88\}$ in rate (linear units), respectively, for the chosen parameters. 
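
The rate gain at a given probability level can be computed from per-user rate samples as a ratio of empirical quantiles. The Python sketch below illustrates the calculation only: the function name is hypothetical and the rate samples are synthetic, not the simulation data behind Fig. 6.

```python
import numpy as np


def rate_gain(rates_scheme, rates_baseline, prob):
    """Ratio of the rates at which the two empirical CDFs reach probability prob."""
    return np.quantile(rates_scheme, prob) / np.quantile(rates_baseline, prob)


# Synthetic per-user rates for illustration only (not the paper's data).
baseline = np.linspace(0.01, 1.0, 100)
scheme = 2.0 * baseline

gain_at_10_percent = rate_gain(scheme, baseline, 0.10)
```

Because the synthetic scheme doubles every user's rate, the gain is 2 at every probability level; with real samples the ratio would vary with `prob`.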

C. Biasing Factor 

The effects of BS density and transmit power on the biasing factors are shown in Figs. 7 and 8, respectively. 
The biasing factors have been normalized so that the macrocell tier has a biasing factor of 1, i.e., no biasing. 
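
As a small worked example of the normalization and of the relation between linear and dB values, the Python sketch below takes the linear SINR biasing factors reported with Fig. 4, normalizes them by the macro tier, and converts them to dB. The helper names are illustrative.

```python
import math


def normalize_bias(bias_linear):
    """Divide by the first (macro) entry so that the macro tier has factor 1."""
    return [b / bias_linear[0] for b in bias_linear]


def to_db(x):
    """Convert a linear power ratio to decibels."""
    return 10.0 * math.log10(x)


sinr_bias = normalize_bias([1.00, 4.00, 11.9])
sinr_bias_db = [round(to_db(a), 1) for a in sinr_bias]
```

The resulting values, 0, 6.0 and 10.8 dB, agree with the dB figures quoted for the SINR bias in Section VI-B.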

When the deployed density of small BSs changes, it is interesting to observe in Fig. 7(a) and 7(b) 
that deploying more small BSs has very little effect on the biasing factor. Intuitively, as the density 
of a BS type increases within a reasonable range, more users are associated with that 
type of BS in the optimal association, which keeps the needed range expansion almost the same as 
in the original scenario. Therefore, the optimal biasing factors remain almost the same as the network 
infrastructure deployment evolves. 

However, the story is quite different when the transmit power changes. As the power of 2nd-tier BSs 
increases in Fig. 8(a), the biasing factor of 2nd-tier BSs steadily decreases, while the biasing factor of 3rd-tier 
BSs stays almost the same. A similar conclusion can be drawn from Fig. 8(b), where the biasing 
factor of 3rd-tier BSs decreases gradually and the biasing factor of 2nd-tier BSs is almost static. The 
biasing factor is smaller at higher transmit power because users are more likely to be associated 
with these BSs under max-biased SINR even without a strong bias. 




Fig. 6. Rate gain in a three-tier HetNet (horizontal axis: probability $P(R<a)$). The rate ratio of joint association scheme, fractional-rounding scheme, dual distributed algorithm, 
and biasing schemes to max-SINR is represented. There is a very large gain for the bottom half of users, i.e., cell edge users. 

Fig. 7. Biasing factors vs. density of small BSs in a three-tier HetNet. The density of one tier changes while the others are fixed, with tier 
1 always having biasing factor 1. Deploying more small BSs has almost no effect on biasing. 

Fig. 8. Biasing factors vs. transmit power of BSs in a three-tier HetNet, with tier 1 having biasing factor 1. 



VII. Conclusion 

In this paper, we propose a class of novel user association schemes that achieve load balancing in 
HetNets through a network-wide utility maximization problem. We first consider the cell association and 
resource allocation jointly, and propose an upper bound on performance. Then we formulate a logarithmic 
utility maximization problem where the equal resource allocation is optimal, and design a distributed 
algorithm via dual decomposition, based on a relaxation of the physical constraints. The distributed algorithm 
is proved to converge to a near-optimal solution, with low complexity that is linear in the number of 
users and the number of BSs. Finally, our scheme is extended to the range expansion technique, which 
requires limited changes to the existing system architecture by introducing biasing factors for small BSs. 
We consider two types of biasing factors (SINR and rate), and evaluate the effects of BS density and 
transmit power on the biasing factors by using our load-aware association scheme. 

A key observation is that the optimal biasing factors are nearly independent of the BS densities of the 
various tiers, but highly dependent on the per-tier transmit powers. With these optimal biasing factors, the 
network nearly achieves the optimal load-aware performance. The numerical results demonstrate that a 
load-aware association significantly improves resource utilization and mitigates the congestion of macro 
BSs, resulting in a multi-fold gain in the overall rate for most users, particularly those with previously low 
rates. Future work could include load-aware user associations that incorporate dynamic settings (dynamic 
traffic, high user mobility), an analytical framework for the biasing factors that yields the 
optimal values theoretically, the uplink scenario with power control, and the consideration of 
additional utility functions. 

Appendix A 
Proof of Proposition 1 

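The objective function of (4) is 

$$\max_{y} \sum_{i \in \{i \mid x_{ij}=1\}} \log(y_{ij} c_{ij}) = \sum_{i \in \{i \mid x_{ij}=1\}} \log(c_{ij}) + \max_{y} \sum_{i \in \{i \mid x_{ij}=1\}} \log(y_{ij}), \qquad (26)$$

where $\sum \log(c_{ij})$ is a constant for given $\mathrm{SINR}_{ij}$. Maximizing the objective function is then equivalent to maximizing 
the geometric mean: 

$$\max_{y} \sum_{i \in \{i \mid x_{ij}=1\}} \log(y_{ij}) \iff \max_{y} \frac{1}{N_{uj}} \log\Big( \prod_{i=1}^{N_{uj}} y_{ij} \Big) \iff \max_{y} \sqrt[N_{uj}]{y_{1j} y_{2j} \cdots y_{N_{uj} j}}, \qquad (27)$$

where $N_{uj}$ denotes the number of users associated with BS $j$. As the geometric mean is no greater than 
the arithmetic mean, we have $\sqrt[N_{uj}]{y_{1j} y_{2j} \cdots y_{N_{uj} j}} \le \frac{y_{1j} + y_{2j} + \cdots + y_{N_{uj} j}}{N_{uj}}$, where equality holds if and only if 
$y_{1j} = y_{2j} = \cdots = y_{N_{uj} j}$. Therefore, to maximize (26), $y_{ij}$ should be equal for all $i$, i.e., $y_{ij} = 1/K_j$, with $K_j = \sum_i x_{ij} = N_{uj}$. 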
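
The conclusion of the proof can be checked numerically. The Python sketch below is illustrative only: it assumes that the resource fractions of the users served by one BS sum to one, draws hypothetical rates $c_{ij}$ at random, and compares the equal allocation against many random allocations sampled from a Dirichlet distribution.

```python
import numpy as np


def log_utility(y, c):
    """Sum of log(y_ij * c_ij) over the users served by a single BS."""
    return np.sum(np.log(y * c))


rng = np.random.default_rng(1)
num_users = 6
c = rng.uniform(0.5, 5.0, size=num_users)   # hypothetical rates c_ij

equal = np.full(num_users, 1.0 / num_users)  # y_ij = 1 / K_j
equal_utility = log_utility(equal, c)

# Random allocations on the simplex (each sums to one, assumed constraint).
best_random = max(
    log_utility(rng.dirichlet(np.ones(num_users)), c) for _ in range(1000)
)
```

None of the sampled allocations exceeds the equal allocation, in line with the arithmetic-geometric mean argument; the rates `c` only shift the utility by a constant and do not affect which allocation is best.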